Abstract

A language $L$ is closed if $L = L^*$. We consider an operation on closed languages, $L^{-*}$, that is an inverse to Kleene closure. It is known that if $L$ is closed and regular, then $L^{-*}$ is also regular. We show that the analogous result fails to hold for the context-free languages. Along the way we find a new relationship between the unbordered words and the prime palstars of Knuth, Morris, and Pratt. We use this relationship to enumerate the prime palstars, and we prove that neither the language of all unbordered words nor the language of all prime palstars is context-free.

1 Inverse star 

Let $L$ be a language such that $L = L^*$. Then, following [3], we say that $L$ is closed.
Brzozowski [2] studied the "smallest" language $M$ such that $L = M^*$.

Definition 1. For closed languages $L$, define

$$L^{-*} = \bigcap_{S^* = L} S.$$

Brzozowski proved

Theorem 2. If $L$ is closed then $(L^{-*})^* = L$. Furthermore $L^{-*} = L - L^2$. If $L$ is regular
and closed, then so is $L^{-*}$.

In this note we show that the class of context-free languages is not closed under the
operation $^{-*}$. First, though, we take a digression to discuss products of palindromes.

2 Palstars, prime palstars, and unbordered words 

In this section we find a new connection between the prime palstars (as introduced in Knuth,
Morris, and Pratt [1]) and the unbordered words.

We start with some definitions. By $w^R$ we mean the reverse of the word $w$. A palindrome
is a word $w$ such that $w = w^R$. In this paper we will only be concerned with the nonempty
palindromes of even length:

$$\mathrm{PAL} = \{xx^R : x \in \Sigma^+\}.$$

A palstar is an element of the language $\mathrm{PALSTAR} := \mathrm{PAL}^*$.

A word $x$ is a prime palstar if it is a nonempty palstar and cannot be written as the product
of two nonempty palstars. Evidently a prime palstar must itself be a palindrome. The first few
prime palstars over {0, 1} are 00, 0110, 010010, 011110, 01000010, 01011010, 01111110, and
their complements, obtained by mapping 0 to 1 and vice versa. The language of all prime
palstars is denoted PRIMEPALSTAR.
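The definitions above can be checked by brute force. The following Python sketch is not from the paper: the function names are hypothetical, and the dynamic-programming test for palstars is just one straightforward way to decide membership in PAL*. It lists the binary prime palstars of length at most 8 that begin with 0.

```python
from itertools import product

def is_palstar(w):
    """True if w is a product of zero or more nonempty even-length palindromes."""
    n = len(w)
    ok = [False] * (n + 1)      # ok[i]: the prefix w[:i] is a palstar
    ok[0] = True                # the empty word is the empty product
    for i in range(2, n + 1, 2):
        for j in range(0, i, 2):
            piece = w[j:i]
            if ok[j] and piece == piece[::-1]:
                ok[i] = True
                break
    return ok[n]

def is_prime_palstar(w):
    """True if w is a nonempty palstar that is not a product of two nonempty palstars."""
    if not w or not is_palstar(w):
        return False
    return not any(is_palstar(w[:i]) and is_palstar(w[i:])
                   for i in range(2, len(w), 2))

def prime_palstars(max_len, alphabet="01"):
    """All prime palstars of length <= max_len, by increasing length."""
    found = []
    for n in range(2, max_len + 1, 2):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if is_prime_palstar(w):
                found.append(w)
    return found

# Prime palstars starting with 0; the rest are their complements.
print([w for w in prime_palstars(8) if w[0] == "0"])
```

The search is exponential in the length, so it is only meant for confirming small cases such as the list given in the text.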

Theorem 3 (Knuth-Morris-Pratt [1]). Every palstar has a unique factorization into prime
palstars.

The proof of this theorem depends on the following lemma: 

Lemma 4 (Knuth-Morris-Pratt [1]). No prime palstar is a proper prefix of another prime 
palstar. 

Corollary 5. If $w$ is a palindrome of even length, then its factorization into prime palstars
must be of the form $w = x_1 x_2 \cdots x_n$, where $x_i = x_{n+1-i}$ for $1 \le i \le n$.

Proof. Suppose $w = x_1 \cdots x_n$ is the factorization into prime palstars $x_i$. If $n = 1$ we are
done. Otherwise, since $w$ ends with $x_n$, it must begin with $x_n^R = x_n$. Hence either $x_1$ is a
prefix of $x_n$, or vice versa. By Lemma 4 we must have $x_1 = x_n$. Using the same argument
on the shorter palindrome $x_1^{-1} w x_n^{-1}$, we derive the remaining equalities. □
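As an illustration of Theorem 3 and Corollary 5, here is a small Python sketch. It is not the authors' procedure: the function name `prime_factors` is hypothetical, and the sketch relies on the observation that the shortest nonempty even-length palindromic prefix of a palstar cannot be split into two nonempty palstars, so by Lemma 4 it must be the first prime factor.

```python
from itertools import product

def prime_factors(w):
    """Factor w into prime palstars, or return None if w is not a palstar.

    Greedy rule: repeatedly strip the shortest nonempty even-length
    palindromic prefix of what remains.
    """
    factors = []
    start = 0
    while start < len(w):
        for end in range(start + 2, len(w) + 1, 2):
            piece = w[start:end]
            if piece == piece[::-1]:
                factors.append(piece)
                start = end
                break
        else:
            return None     # no even-length palindromic prefix remains
    return factors

def corollary5_holds(max_half_len, alphabet="01"):
    """Check that every even-length palindrome factors symmetrically."""
    for n in range(1, max_half_len + 1):
        for letters in product(alphabet, repeat=n):
            x = "".join(letters)
            w = x + x[::-1]
            f = prime_factors(w)
            if f is None or "".join(f) != w or f != f[::-1]:
                return False
    return True

print(prime_factors("0110000110"))
print(corollary5_holds(6))
```

The first call should return the three factors 0110, 00, 0110, which read the same in either direction, as the corollary predicts.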

We now turn to borders. A word is said to be bordered if it has some nonempty prefix 
that is also a suffix. Otherwise, it is unbordered. Unbordered words are also called bifix-free 
in the literature [5]. 

Equivalently, a word w is bordered if it can be written in the form xyx for some nonempty 
word x. For example, entanglement begins and ends with the string ent. 

Given two words of the same length $x = a_1 a_2 \cdots a_n$ and $y = b_1 b_2 \cdots b_n$, their perfect
shuffle $x \,ш\, y$ is defined by $x \,ш\, y = a_1 b_1 \cdots a_n b_n$.

Theorem 6. A word $w$ is a prime palstar if and only if there exists an unbordered word $z$
such that $w = z \,ш\, z^R$.



Proof. Suppose $w$ is not a prime palstar. If $w$ is not an even-length palindrome then it is
certainly not of the form $z \,ш\, z^R$. Suppose then that $w$ is an even-length palindrome and
hence is of the form $z \,ш\, z^R$. We will show that $z$ is bordered. Since $w$ is not a prime palstar
we can factor $w$ into a product of two or more prime palstars. Then by Corollary 5 such a factorization
must look like $x \cdots x$ for some palindrome $x$. Then when we "unshuffle" $w$ into $z$ and $z^R$,
we get that $z$ starts with the odd-indexed letters of $x$ and ends with the odd-indexed letters
of $x^R$. But $x = x^R$, so $z$ starts and ends with the same word.

On the other hand, suppose $w = x \,ш\, y$. By comparing the symbols of $x$ to those of $y$ we see that
if $y \ne x^R$, then $w$ is not a palindrome. So assume $y = x^R$. Now if $x$ is bordered, then
we can write it as $x = zuz$ for some nonempty string $z$. Then $w = (zuz) \,ш\, (zuz)^R =
(z \,ш\, z^R)(u \,ш\, u^R)(z \,ш\, z^R)$. This gives a factorization of $w$ as a product of two or three
nonempty palstars (according to whether $u$ is empty or nonempty). □

An example of this theorem in English is noon, which is a prime palstar, and is the shuffle 
of the unbordered word no with its reversal. 
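The two ingredients of the proof, the perfect shuffle and the splitting of the shuffle of a bordered word, are easy to experiment with. The Python sketch below is illustrative only; the helper names (`perfect_shuffle`, `is_bordered`, `splitting_identity_holds`) are hypothetical and do not come from the paper.

```python
from itertools import product

def perfect_shuffle(x, y):
    """Interleave two words of equal length: a1 b1 a2 b2 ... an bn."""
    if len(x) != len(y):
        raise ValueError("perfect shuffle needs words of equal length")
    return "".join(a + b for a, b in zip(x, y))

def is_bordered(w):
    """True if some nonempty proper prefix of w is also a suffix of w."""
    return any(w[:k] == w[-k:] for k in range(1, len(w)))

def words(n, alphabet="01"):
    """All words of length n over the given alphabet."""
    return ("".join(t) for t in product(alphabet, repeat=n))

def splitting_identity_holds(z, u):
    """Check (zuz) sh (zuz)^R == (z sh z^R)(u sh u^R)(z sh z^R)."""
    x = z + u + z
    zz = perfect_shuffle(z, z[::-1])
    uu = perfect_shuffle(u, u[::-1])
    return perfect_shuffle(x, x[::-1]) == zz + uu + zz

print(perfect_shuffle("no", "on"))                      # noon
print(is_bordered("entanglement"), is_bordered("no"))   # True False
print(all(splitting_identity_holds(z, u)
          for nz in range(1, 4) for z in words(nz)
          for nu in range(0, 4) for u in words(nu)))
```

The last line tests the identity used in the second half of the proof for all binary $z$ of length 1 to 3 and all binary $u$ of length 0 to 3.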

3 Enumeration of palstars 

As far as we know, up to now no one has enumerated the palstars. However, our argument 
above allows us to do so, based on enumeration of the unbordered words. 

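Nielsen [5] has shown that if $a_n$ denotes the number of unbordered words of length $n$ over
an alphabet of size $k$, then

$$a_n = \begin{cases} k, & \text{if } n = 1; \\ k\,a_{n-1} - a_{n/2}, & \text{if } n \text{ even}; \\ k\,a_{n-1}, & \text{if } n \text{ odd and } > 1. \end{cases}$$

(Also see p.J.) Furthermore, he showed that $a_n \sim c_k k^n$, where $c_k$ is a constant that tends to
1 as $k \to \infty$, and $c_2 \doteq 0.2677868$.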

It follows from Theorem 6 that if $b_n$ is the number of prime palstars of length $2n$, then $b_n = a_n$. In
particular, about 27% of all binary palindromes of even length are prime palstars.
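The enumeration can be reproduced numerically. The Python sketch below assumes the standard form of Nielsen's recurrence ($a_1 = k$, $a_n = k\,a_{n-1} - a_{n/2}$ for even $n$, and $a_n = k\,a_{n-1}$ for odd $n > 1$); the function names are hypothetical, and the brute-force counter is included only as a cross-check on short lengths.

```python
from itertools import product

def unbordered_counts(n_max, k=2):
    """Return [a_1, ..., a_n_max] computed from the recurrence."""
    a = [0] * (n_max + 1)           # a[0] is unused padding
    for n in range(1, n_max + 1):
        if n == 1:
            a[n] = k
        elif n % 2 == 0:
            a[n] = k * a[n - 1] - a[n // 2]
        else:
            a[n] = k * a[n - 1]
    return a[1:]

def is_unbordered(w):
    """True if no nonempty proper prefix of w is also a suffix."""
    return not any(w[:j] == w[-j:] for j in range(1, len(w)))

def brute_force_count(n, k=2):
    """Count unbordered words of length n over a k-letter alphabet (k <= 10)."""
    alphabet = "0123456789"[:k]
    return sum(is_unbordered("".join(t)) for t in product(alphabet, repeat=n))

counts = unbordered_counts(40)
print(counts[:10])
print(all(brute_force_count(n) == counts[n - 1] for n in range(1, 11)))
print(counts[-1] / 2 ** 40)         # fraction of binary words of length 40
```

Because $b_n = a_n$, the same list counts the binary prime palstars of length $2n$, and the final ratio approximates the constant $c_2$ quoted above.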

4 Context-free languages and inverse star 

We now apply the results in Section 2 to prove that the class of context-free languages is not
closed under inverse star.

Clearly $\mathrm{PALSTAR} = \mathrm{PAL}^*$ is context-free. We have $\mathrm{PRIMEPALSTAR} = \mathrm{PALSTAR}^{-*}$. So it
suffices to show that PRIMEPALSTAR is not context-free. Suppose it were. First, we need the
following result.




Theorem 7. The language $U$ of unbordered words over an alphabet of size at least 2 is not
context-free.

Proof. Assume it is. Without loss of generality the alphabet is $\Sigma = \{0, 1, \ldots\}$. Consider

$$U' := U \cap 1\,0^+\,1\,0^+\,1\,0^+\,1\,0^+,$$

the intersection of $U$ with a regular language. Then

$$U' = \{1\,0^a\,1\,0^b\,1\,0^c\,1\,0^d : (a < d) \text{ and } ((a \ne c) \text{ or } (b < d))\}.$$

Since the context-free languages are closed under intersection with a regular language, it
suffices to prove $U'$ is not context-free.
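The description of $U'$ can be checked mechanically for small exponents. In the Python sketch below, which is only an illustration with hypothetical function names, a word $1\,0^a\,1\,0^b\,1\,0^c\,1\,0^d$ with $a, b, c, d \ge 1$ is tested for borders directly and compared with the condition ($a < d$) and (($a \ne c$) or ($b < d$)).

```python
def is_unbordered(w):
    """True if no nonempty proper prefix of w is also a suffix."""
    return not any(w[:k] == w[-k:] for k in range(1, len(w)))

def block_word(a, b, c, d):
    """The word 1 0^a 1 0^b 1 0^c 1 0^d."""
    return "1" + "0" * a + "1" + "0" * b + "1" + "0" * c + "1" + "0" * d

def predicted_unbordered(a, b, c, d):
    """The membership condition claimed for U'."""
    return a < d and (a != c or b < d)

def characterization_holds(limit):
    """Compare the direct test with the claimed condition for 1 <= a,b,c,d <= limit."""
    r = range(1, limit + 1)
    return all(is_unbordered(block_word(a, b, c, d)) == predicted_unbordered(a, b, c, d)
               for a in r for b in r for c in r for d in r)

print(characterization_holds(6))

# With a = c and b = d the word 1 0^a 1 0^b is both a prefix and a suffix.
print(is_unbordered(block_word(9, 10, 3, 10)))   # True: a < d and a != c
print(is_unbordered(block_word(9, 10, 9, 10)))   # False: a == c and b == d
```

The last two lines show how raising the third exponent until it equals the first turns an unbordered word into a bordered one.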

To do this, we use Ogden's lemma [6]. Choose

$$z = 1\,\overbrace{0^{n+n!}}^{A}\,1\,\overbrace{0^{n+1+n!}}^{B}\,1\,\overbrace{0^{n}}^{C}\,1\,\overbrace{0^{n+1+n!}}^{D} \in U',$$

and distinguish the third block of 0's, the one corresponding to $C$. Write $z = uvwxy$. Then
by Ogden's lemma $vwx$ must contain at most $n$ distinguished positions and $vx$ at least one.

If $vx$ contains a 1, then by pumping we get a string with too many 1's. Thus $vx$ contains
0's only, and each of $v$, $x$ is contained in a single block of zeros.

Case 1: $v$ contains 0's from block $A$, and $x$ contains 0's from block $C$. Then consider
$uv^2wx^2y = 1\,0^{a'}\,1\,0^{b'}\,1\,0^{c'}\,1\,0^{d'}$. It has $a' \ge d'$, a contradiction.

Case 2: $v$ contains 0's from block $B$, and $x$ contains 0's from block $C$. Then consider
$uv^iwx^iy = 1\,0^{a'}\,1\,0^{b'}\,1\,0^{c'}\,1\,0^{d'}$, where $i = (n!/|x|) + 1$. Then this string has $a' = c'$, $b' \ge d'$,
a contradiction.

Case 3: $vx$ contains 0's from block $C$ only. Then as in the previous case, choose $i = (n!/|vx|) + 1$.
The resulting string has $a' = c'$ and $b' \ge d'$, a contradiction.

Case 4: $v$ contains 0's from block $C$, and $x$ contains 0's from block $D$. Consider $uv^iwx^iy =
1\,0^{a'}\,1\,0^{b'}\,1\,0^{c'}\,1\,0^{d'}$ with $i = 0$ to get $a' \ge d'$, a contradiction. □

Now, using this result, we can prove our last result: 

Theorem 8. Over an alphabet of two or more letters, PRIMEPALSTAR is not context-free. 

Proof. Consider the morphisms $g$ and $h$ defined as follows: $g(a) = 00$, $g(b) = 01$, $g(c) = 10$,
$g(d) = 11$, and $h(a) = h(b) = 0$, $h(c) = h(d) = 1$. Then the effect of $h \circ g^{-1}$ is to extract the
odd-indexed letters from an even-length word.

Assume that PRIMEPALSTAR is context-free. Since the context-free languages are closed under
morphism and inverse morphism, $h(g^{-1}(\mathrm{PRIMEPALSTAR}))$ would be context-free.
But by Theorem 6, $h(g^{-1}(\mathrm{PRIMEPALSTAR})) = U$, the language of unbordered words,
which we have shown in Theorem 7 to be non-context-free. □
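To see concretely why $h \circ g^{-1}$ extracts the odd-indexed letters, here is a short Python sketch over the binary alphabet. The dictionaries and function names are hypothetical encodings of the morphisms $g$ and $h$ from the proof, not code from the paper.

```python
from itertools import product

G = {"a": "00", "b": "01", "c": "10", "d": "11"}     # the morphism g
H = {"a": "0", "b": "0", "c": "1", "d": "1"}         # the morphism h
G_INV = {block: letter for letter, block in G.items()}

def g_inverse(w):
    """The unique preimage under g of an even-length binary word."""
    if len(w) % 2:
        raise ValueError("an odd-length word has no preimage under g")
    return "".join(G_INV[w[i:i + 2]] for i in range(0, len(w), 2))

def h(v):
    """Apply h letter by letter to a word over {a, b, c, d}."""
    return "".join(H[letter] for letter in v)

def odd_indexed_letters(w):
    """h(g^{-1}(w)): the 1st, 3rd, 5th, ... letters of w."""
    return h(g_inverse(w))

print(odd_indexed_letters("0110"))      # 01
print(odd_indexed_letters("010010"))    # 001

# Cross-check against plain slicing on all even-length binary words up to length 8.
print(all(odd_indexed_letters("".join(t)) == "".join(t)[::2]
          for n in range(0, 9, 2) for t in product("01", repeat=n)))
```

Applied to a prime palstar $z \,ш\, z^R$, the extraction returns $z$ itself, which is how Theorem 6 turns PRIMEPALSTAR into the language of unbordered words.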